SF2A 2012 

S. Boissier, P. de Laverny, N. Nardetto, R. Samadi, D. Valls-Gabaud and H. Wozniak (eds) 



BLIND DECOMPOSITION OF HERSCHEL-HIFI SPECTRAL MAPS OF THE NGC 

7023 NEBULA 

Olivier Berne 1 , Christine Joblin 1 , Yannick Deville 1 , Paolo Pilleri 2 , Jerome Pety 3 , David Teyssier 4 , 

Maryvonne Gerin 5 and Asuncion Fuente 2 



Abstract. 

Large spatial-spectral surveys are more and more common in astronomy. This calls for the need of new 
methods to analyze such mega- to giga-pixel data-cubes. In this paper we present a method to decompose 
such observations into a limited and comprehensive set of components. The original data can then be 
interpreted in terms of linear combinations of these components. The method uses non-negative matrix 
factorization (NMF) to extract latent spectral end-members in the data. The number of needed end-members 
is estimated based on the level of noise in the data. A Monte-Carlo scheme is adopted to estimate the optimal 
end-members, and their standard deviations. Finally, the maps of linear coefficients are reconstructed using 
non-negative least squares. We apply this method to a set of hyperspectral data of the NGC 7023 nebula, 
obtained recently with the HIFI instrument onboard the Herschel space observatory, and provide a first 
interpretation of the results in terms of 3-dimensional dynamical structure of the region. 
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Telescopes keep growing in diameter, and detectors are more and more sensitive and made up of an increasing 
number of pixels. Hence, the number of photons that can be captured by an astronomical instrument, in a given 
amount of time and at a given wavelength, has raised significantly thus allowing astronomy to go hyperspectral. 
More and more, astronomers do not deal with 2D images, or ID spectra, but with a combination of both 
of these media giving 3D data-cubes (2 spatial dimensions, 1 spectral dimension). The PACS, SPIRE and 
HIFI instruments, onboard Herschel all have a mode allowing spectral mapping (e.g. van Kempen et al. 201fj : 
Habart et alJl2O10t Joblin et al. 2010h in atomic and molecular lines. Owing to its high spectral resolution, HIFI 
allows to reso lve the profiles of these lin es, enabling to study the ki nematics of e . g. the immediate surrounding 
of protostars ([van Dishoeck et al.ll201ll ). or of star-forming regions ([Pilleri et al.ll2012l) using radiative transfer 
models. 

Although such 3D datasets have become common, there is a lack of methods to analyze the outstanding 
amount of information they contain. Classical analysis methods tend to decompose the spectra by fitting them 
with simple functions (typically mixture of gaussians) but this has several disadvantages: 1) the assumption 
made by the use of a given function usually not based on physical arguments 2) if the number of parameters 
is high, the result of the fit may be degenerate 3) for large datasets and fitting with nonlinear functions, 
the fitting may be very time consuming 4) initial guesses must be provided 5) the spectral fitting is usually 
performed on a (spatial) pixel by pixel basis, so that the extracted components are spatially independent 
whereas physical components are often present at large scales o n the image. Alternat i vely it is possible to 
decompose data-c ubes using Principal Component Analysis (e.g. Ungerechts et al. 1997 : Brunt k, Heveri r2002: 
Brunt et al.| [2009). However this, has the disadvantage to decompose data onto an orthogonal basis, which 
produces components which are difficult to interpret directly in physical terms (e.g. spectra with negative 
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values). An alternative analysis was proposed bv ljuvela et al. ( 1996TK which consists in decomposing spectral 
cubes, in this case spectral maps of the pure rotational lines of carbon monoxide (CO), into the product of a 
small number of spectral components or "end members" and spatial "abundance" maps, with an enforcement 
of positivity for all the points in the decomposition. There is no assumption on spectral properties of the 
components, and hence this can provide deeper insights into the physical structure represented in the data, as 
demonstrated in this pioneering paper. This method uses the positivity constraint of the maps and spectra (all 
their points must be positive) combined with the minimization of a statistical criterio n to derive th e maps and 
spectral components. This method is referred to as positive matrix factorization (PMF. |Paateroll99~4 ). Although 
it contained the original idea of using positivity as a co nstraint to est i mate a matrix product, this work used 
a classical optimization algorithm. Several years later, Lee Sz Seune ( 1999f) introduced a novel algorithm to 
perform PMF using simple multiplicative iterative rules, making the PMF algorithm extremely fast. This 
algorithm is usually referred to as Lee and Seung's Non Negative Matrix Factorization (NMF hereafter) and 
has been widely used in a vast nu mber of applications outsi de astronomy. Although some theoretical aspects 
of this method are still questi oned ( Donoho fc Stodden 20031) , this algorithm has proven its efficiency including 
in astrophysical applications ( Berne et al. 2007t ). However. NMF has several disadvantages: 1) the number of 
spectra to be extracted must be given by the user and is usually hard to guess 2) the error-bars related to the 
procedure are not given automatically. 

In this paper, we present an alternative way of performing a kincmatical study of spectral maps, using a 
method which combines NMF to a classification algorithm and a Monte-Carlo analysis. This approach tends to 
discard the standard drawbacks of NMF described above, thus providing a quasi-optimal decomposition. Here 
we apply this method to Herschel-HIFI spectral maps of the [CII] and 13 CO(8-7) lines at 1900 and 881 GHz 
respectively. We describe the algorithms we use in Sect. 1 and the method itself in Sect. 2, and then apply it 
to the real data. 



1 Definitions and algorithms employed by the method 

1.1 Mathematical description of the data and aims 

In hyperspectral astronomy, the observed data consist ofa3DmxnxI matrix C(p x ,p y , v) where (p x ,p y ) define 
the spatial coordinates and v the spectral index. We assume that all the points in C(p x ,p v ,v) are positive. We 
call spectrum each vector x{p x ,p y ,v) recorded at a position (p x ,p y ) over the / wavelength points. The goal of 
the method that we will describe here is to decompose C into the product of a few (typically < 10) spectra 
and weight maps using the measured noise in the data as the only input to the method and to provide an error 
estimation at each point in the extracted spectra. The main algorithms we use here are Lee & Seung's NMF 
and K-means which we describe hereafter. 



1.2 Lee &. Seung's NMF in our context 

We define a new positive 2D matrix of observations X, the rows of which contain the m x n, spectra of C 
arranged in any order. We now assume that each spectrum, x, is a linear combination of a limited number r 
(with r <§; m x n,) of unknown source spectra, i.e. 

r 

x(Px,P y ,v) = ^a l {p x ,p y )s l (v) +n(p x ,p y ,v), (1.1) 
j=i 

where s l (v) are source spectra, i is the source index, a t (p x ,p y ) are the unknown "weight" coefficients and 
n{p x ,p y ,v) is additive noise. This can be re- written in the following matrix form: 



X = AS + N, 



(1.2) 



where A is the m x n x r matrix of unknown coefficients of the linear combinations and S is an r x I matrix, the 
rows of w hich are the so urce spectra and N is the noise matrix. This is a t ypical blind source separation (BSS) 
problem dCardoscI 19981). and ca n be solved using multiple methods (e.g. Lee fc Seune 1999 . Hvyarinen et al 



200lL IGribonval fc Lesagell2006l) . Here, we use Non-Negative matrix factorization ILee fc SeuneJ (| 19991 ) that ii 



applicable because A and S are positive. The objective is to find estimates of A and S, respectively W and H 
so that 

X w WH, (1.3) 
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This is done by adapting the non- negative matrices W and H so as to minimise the divergence 5(X\WH), 
defined as 

5(X\WH) = E(X«lo g -^-j (1.4) 

ij 

-X ij + (WH) ij ), (1.5) 

where the exponents i and j respectively refer to the row and column indexes of the matrices. The algorithm 
used to achieve the minimization of the divergence is based on the iterative update rule 



V H au X iu /(WH) iu 
W ia <- Wi a U \ (1.6) 



Divergence is non increasing under their respective update rules, so that starting from random W and H 
matrices, the algorithm will converge towards a minimum for 5. This provides the matrix H containing the r 
estimated source spectra hi (in the following wc refer to these as simply " source spectra"). The convergence 
criterion that we have used here measures the evolution of the divergence as a function of the iteration step. 
We have assumed that convergence is reached when 



S i+i 



where k is small, typically 0.0001. 



1.3 K-means 

K-means is a standard unsupervised classification method wich aims to partition a set of vectors into k sets 
a = {0-1,0-2,0-3 ■ ■ ■ o-k} so that within each set, a distance is minimized. In our case, this distance is 1 minus the 
correlation coefficient. Formally this reads: 

argmin^^ 1 — corr(hj, in), (1.8) 

i—l,k hj Gcrj 

where corr stands for correlation coefficient and /ij is the mean spectrum in cluster Oi. Said in a simple way, 
K-means as defined here forms clusters that maximize the correlation between vectors within each cluster. The 
algorithm used to perform K-means is the one provided in Matlab. 



2 Architecture of the method 

This section describes how we use NMF and K-means as well as a Monte-Carlo (MC) analysis to build our 
method. The three distinct steps of this method are: 

• identification of the number of source spectra based on the difference between the estimated power of noise 
and reconstruction residuals 

• estimation of the source spectra using NMF and errors using MC analysis, 

• reconstruction of the weight maps using non-negative least squares. 
These steps are described in the following sections. 
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Fig. 1. Illustration of the results of our method on the Herschel-BIFI data of the NGC 7023 nebula. On the left, the 
Spitzer-IRAG images shows the structure of the nebula at mid-infrared wavelengths (polycyclic aromatic hydrocarbon 
emission). The right part of the figure shows the weight maps (upper panel) and spectral components (lower panel) 
extracted using the NMF-based method described in this paper. Color-scale and intensities in the spectra are in arbitrary 
units (due to the scale uncertainty inherent to blind signal separation methods such as NMF). The error intervals (e(u)) 
as estimated by the method for each source spectrum is shown with the gray envelope (which is very small in the case 
of the 13 CO line). The dashed line indicates A Vlsr = 0. 



2.1 Identification of the number of source spectra 

The identification of the number of sources in our method relies on the estimation of the norm of the noise 
matrix N in the data. In real observations N is unknown. However, it is often possible to get an accurate 
estimation of the power of the noise in the data (this will be described in a subsequent paper, Berne et al. in 
prep.). Let us consider v r ms, which is an estimation of the Frobenius norm of N: 

Vrms ~ \\N\\ F . (2.1) 

In order to identify the number of source spectra, NMF as described above is applied on C starting with the 
minimum number of source spectra r = 2. Once NMF has converged, the matrix of approximated observations 
X is obtained by: 

X = WH. (2.2) 
The norm of residuals (i.e. the difference between original and reconstructed observations) is calculated by: 

$ = \\X-X\\ F . (2.3) 

If 

9 - Vrms < 0, (2.4) 

then the number of sources is r = 2. If the above inequality is not verified then the algorithm retries for r = r + 1 
and such until Eq. (|2.4[) is verified. Note that we do not bring the theoretical proof that $ is decreasing when 
r increases, however, empirical tests on several data-sets (including the one presented in this paper in Sect. [3]) 
show that this is the case. 
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2.2 Monte-Carlo estimation of the source spectra and empirical standard deviation 

Once the number of sources r has been identified we can run NMF with this value of r fixed, for p trials, with 
different initial random matrices for each trial, p is typically 100. For each value trail a set of r source spectra 
are identified. The total number of obtained spectra at the end of this process is hence pxr. The algorithm uses 
K — means to form r sets {o~\, 02, . . .ay} each set containing kx, k2, . . . , k r spectra. Each final source spectrum 
is obtained by averaging the k n spectra in each set <r„. The rows of a new matrix called contain these r 
averaged vectors. The standard deviation at each wavelength v for a given source spectrum is obtained by 



£(^(«)-^)) a , (2.5) 
»=i 

where hi(v) is the value of the source spectrum in a set at a given wavelength, h denotes the average of hi(v) in 
the cluster, e is therefore the empirical sample standard deviation of the value of a point in a source spectrum 
within its cluster. 




2.3 Reconstruction of the weight maps 

The matrix of weights W* is reconstructed by minimizing 

\\X-W.H m \\. (2.6) 

This is done by using the classical non-negative least square algorithm. In the present case we have used the 
version of this algorithm provided in Matlab. 



3 Application to Herschel observations 



3.1 Data 

The Herschel space telescope is an infrared observatory combining a 3.5 meter telescope and 3 instruments. 
Here we have used data from the Heterodyne Instrument for Far Infrared (HIFI) which allows spectral mapping 
at high spectral resolution (v/Av ~ 10 6 ) and high angular resolution (~ 10 — 40") between 480-1250 GHz, and 
1410-1910 GHz, as part of the open tim e program "Physics of gas evaporation at PDR edges" (PI C. Joblin). 
We have observed the NGC 7023 nebula (jFuente et al. 1996 ). where a massive star has blown a cavity inside the 



cloud where it formed. It is illuminated by the young Be star HD200775. With HIFI we have observed the fine 
structure line emitted by the atomic carbon ion (C + ) at 1900 GHz as well as the pure rotational line of carbon 
monoxide isotope 13 CO(8-7) at 881 GHz in spectral mapping. The resulting data consisting in a 3D matrix of 
30x52 spatial positions and 201 spectral points for C + and 18x30 spatial positions and 143 spectral points for 
the 13 CO(8-7) spectral map. 



3.2 Applying the method to our data 

The spectra in this dataset are positive and we assume that non-linear radiative transfer (e.g. re-absorption) 
effects are not predominant which for the C + and 13 CO(8-7) line in such physical environment is acceptable. 
Therefore, the proposed method is applicable. We have applied the method described in Sections [1] and [5] to the 
dataset presented above. The whole method, implemented under Matlab runs typically in a few minutes when 
applied to these data. The resulting spectra and weight maps are shown in Fig. Q] 



3.3 Preliminary results for C + 

Four spectra (and weight maps) are identified by the method. Each spatial and spectral component shows 
distinct features. For clarity, the components can be ordered from 1 to 4 based on their kinematic properties in 
the following way (Fig. [1}: 

• Component 1: lowest peak velocity (about -0.8 km s _1 relative to LSR) 

• Component 2: mid peak velocity (about +0.8 km s _1 relative to LSR) 

• Component 3: high peak velocity (about + 2 km s^ 1 relative to LSR) 

• Component 4: double peaked, very high (+3 km s _1 ) and very low (-2 km s _1 ) velocity relative to LSR 
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3. 4 Preliminary results for 13 CO(8- 7) 

For the 13 CO(8-7) line we are only able to find two components, one being a complete noise map, so in reality 
it appears as a unique 13 CO(8-7) component. This component is shown in Fig.[T] 

3.5 Discussion and conclusions 

A preliminary interpretation of these results can be formulated. It seems that we are observing the expansion of 
a roughly symmetric cavity, roughly centered at the peak position in the map of component 4. At this position, 
we are therefore seeing gas coming towards us and moving away from us at high velocities. The expansion 
velocity can be derived from this component by taking half of the velocity difference between the two peaks 
in the spectrum of component 4, that is about 2.5 km s _1 . The other 3 components seem to correspond to 
the other concentric shells which are expanding at the same velocity, but because of the projection effects their 
resulting absolute velocities as observed with Herschel are smaller than the expansion velocity of 2.5 km s _1 . 
The radius of the shell can be estimated using the map of component 2 and taking the distance between the 
emission peak in the North and the emission peak in the South. This represents an angular size of 60", that is 
R = 3.5 10 15 m in physical scale using a measured distance of 430 parsec. Using this radius and the expansion 
velocity we can derive the age of the nebula to be 4.5 x 10 5 years. We also note that the center of the expanding 
shell does not correspond to the position of the star. This is expected since this star is known to have a large 
proper motion, meaning that it must have moved significantly over the expansion timescale.The exact origin of 
this expanding shell needs to be investigated in de tail, but it is clear t hat its expansion is driven by the massive 
star that has formed within the molecular cloud ( Fuente et al.|[l996h . Several scenarios are envisaged: winds 



and outflow from the star, thermal expansion under irradiation, rocket acceleration of the surrounding cloud 
due to photo-evaporation etc. Finally, the fact that the CO component, which traces denser regions, appear as 
a unique component, is compatible with the idea that the warmer gas, traced by the C + , is subject to much 
more dynamical structure because of evaporation. It could also be that there is only one CO component because 
it traces the edge-on shell which has a higher density than the rest of the expanding shell, that can be seen in 
C + . Further interpretation and modeling is required to take the best advantage of the results obtained here, 
however, astronomical observations in the far infrared at both high angular and high spectral resolutions, can 
be interpreted in terms of a linear combination of few components extracted using an NMF based method. 
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